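Discrete state spaces represent a major computational challenge to statistical inference, since the computation of normalisation constants requires summation over large or possibly infinite sets, which can be impractical. This paper addresses this computational challenge through the development of a novel generalised Bayesian inference procedure suitable for discrete intractable likelihoods. Inspired by recent methodological advances for continuous data, the main idea is to update beliefs about model parameters using a discrete Fisher divergence, in lieu of the problematic intractable likelihood. The result is a generalised posterior that can be sampled using standard computational tools, such as Markov chain Monte Carlo, circumventing the intractable normalising constant. The statistical properties of the generalised posterior are analysed, and sufficient conditions for posterior consistency and asymptotic normality are established. In addition, a novel and general approach to the calibration of generalised posteriors is proposed. Applications are presented on lattice models for discrete spatial data and on multivariate models for count data; in each case the methodology facilitates generalised Bayesian inference at low computational cost.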
translated by 谷歌翻译
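Why a discrete Fisher divergence sidesteps the normalising constant can be seen in a small sketch: the loss only evaluates ratios of the model probability at neighbouring states, so any constant factor cancels. The sketch assumes the empirical divergence has the form (1/n) Σ_i Σ_j [ (q(x_i^{-j}) / q(x_i))^2 - 2 q(x_i) / q(x_i^{+j}) ], where x^{+j} and x^{-j} increase or decrease coordinate j by one. It also uses a hypothetical one-dimensional model on a cyclic state space {0, ..., K-1} (wrap-around guarantees every state has neighbours), an unnormalised log-density θ cos(2πx/K), a flat prior on a grid of θ values, and a learning rate β = 1. None of these choices are taken from the paper.

```python
import math
import random

K = 10  # size of the hypothetical cyclic state space {0, ..., K-1}


def log_q(x, theta):
    """Unnormalised log-density of a hypothetical discrete model (no log Z term)."""
    return theta * math.cos(2 * math.pi * x / K)


def dfd_loss(xs, theta, log_q=log_q):
    """Empirical discrete Fisher divergence (assumed form); only ratios of q are evaluated."""
    total = 0.0
    for x in xs:
        x_minus, x_plus = (x - 1) % K, (x + 1) % K  # cyclic neighbours of x
        ratio_minus = math.exp(log_q(x_minus, theta) - log_q(x, theta))  # q(x-)/q(x)
        ratio_plus = math.exp(log_q(x, theta) - log_q(x_plus, theta))    # q(x)/q(x+)
        total += ratio_minus ** 2 - 2.0 * ratio_plus
    return total / len(xs)


def generalised_posterior(xs, grid, beta=1.0):
    """Grid weights proportional to prior(theta) * exp(-beta * n * loss), with a flat prior."""
    n = len(xs)
    log_w = [-beta * n * dfd_loss(xs, theta) for theta in grid]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    return [wi / z for wi in w]


if __name__ == "__main__":
    random.seed(0)
    theta_true = 1.0
    # Exact sampling is possible here only because K is tiny; the loss itself never needs Z.
    weights = [math.exp(log_q(x, theta_true)) for x in range(K)]
    data = random.choices(range(K), weights=weights, k=2000)
    grid = [i / 10 for i in range(21)]
    post = generalised_posterior(data, grid)
    print("posterior mode:", grid[post.index(max(post))])
```

Adding a constant to `log_q` (that is, rescaling q) leaves the loss unchanged, which is exactly the property that makes the generalised posterior computable without the normalising constant.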
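Generalised Bayesian inference updates prior beliefs using a loss function rather than a likelihood, and can therefore be used to confer robustness against possible mis-specification of the likelihood. Here we consider generalised Bayesian inference with a Stein discrepancy as the loss function, motivated by applications in which the likelihood contains an intractable normalisation constant. In this context, the Stein discrepancy circumvents evaluation of the normalisation constant and produces generalised posteriors that are either available in closed form or accessible using standard Markov chain Monte Carlo. On a theoretical level, we show consistency, asymptotic normality, and bias-robustness, highlighting how these properties are impacted by the choice of Stein discrepancy. We then provide numerical experiments on a range of intractable distributions, including applications to kernel-based exponential family models and non-Gaussian graphical models.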
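A Stein discrepancy depends on the model only through its score, d/dx log q(x), which is unchanged when q is multiplied by a constant; this is why the normalisation constant never has to be evaluated. The minimal one-dimensional sketch below computes the V-statistic estimate of the squared kernel Stein discrepancy (KSD) with the Langevin Stein kernel built on an RBF base kernel, and plugs it into an unnormalised generalised log-posterior. The Gaussian location model (chosen only because its score is simple), the bandwidth h = 1, the flat prior, and β = 1 are illustrative assumptions, not the paper's settings.

```python
import math
import random


def score(x, theta, sigma=1.0):
    """d/dx log q(x) for an illustrative Gaussian location model; log Z drops out."""
    return -(x - theta) / sigma ** 2


def stein_kernel(x, y, theta, h=1.0):
    """Langevin Stein kernel on the RBF kernel k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    d = x - y
    k = math.exp(-d * d / (2 * h * h))
    dk_dx = -d / h ** 2 * k
    dk_dy = d / h ** 2 * k
    d2k_dxdy = (1 / h ** 2 - d * d / h ** 4) * k
    sx, sy = score(x, theta), score(y, theta)
    # u(x, y) = s(x) s(y) k + s(x) dk/dy + s(y) dk/dx + d2k/dxdy
    return sx * sy * k + sx * dk_dy + sy * dk_dx + d2k_dxdy


def ksd_squared(xs, theta, h=1.0):
    """V-statistic estimate of the squared KSD between the data and the model."""
    n = len(xs)
    return sum(stein_kernel(x, y, theta, h) for x in xs for y in xs) / n ** 2


def log_generalised_posterior(xs, theta, beta=1.0, log_prior=lambda t: 0.0):
    """Unnormalised log generalised posterior: log prior(theta) - beta * n * KSD^2."""
    return log_prior(theta) - beta * len(xs) * ksd_squared(xs, theta)


if __name__ == "__main__":
    random.seed(1)
    data = [random.gauss(2.0, 1.0) for _ in range(200)]
    for theta in (0.0, 2.0, 4.0):
        print(theta, round(log_generalised_posterior(data, theta), 2))
```

Because the Stein kernel is positive definite, the V-statistic is non-negative, and the resulting log-posterior can be passed unchanged to any standard MCMC sampler.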
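Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of the uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is proposed that enables the user to posit an appropriate Gaussian process covariance function for the task at hand. Our approach constructs a prior distribution for the parameters of the network, called a ridgelet prior, that approximates the posited Gaussian process in the output space of the network. In contrast to existing work on the connection between neural networks and Gaussian processes, our analysis is non-asymptotic, with finite-sample-size error bounds provided. This establishes the universality property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular. Our experimental assessment is limited to a proof of concept, where we demonstrate that the ridgelet prior can outperform an unstructured prior on regression problems for which a suitable Gaussian process prior can be provided.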
We consider task allocation for multi-object transport using a multi-robot system, in which each robot selects one object among multiple objects with different and unknown weights. Existing centralized methods assume that the numbers of robots and tasks are fixed, which makes them inapplicable to scenarios that differ from the learning environment. Meanwhile, existing distributed methods limit the minimum number of robots and tasks to a constant value, making them applicable to various numbers of robots and tasks. However, they cannot transport an object whose weight exceeds the load capacity of the robots observing it. To handle various numbers of robots and objects with different and unknown weights, we propose a framework that uses multi-agent reinforcement learning for task allocation. First, we introduce a structured policy model consisting of 1) predesigned dynamic task priorities with global communication and 2) a neural-network-based distributed policy model that determines the timing for coordination. The distributed policy builds consensus on the high-priority object under local observations and selects cooperative or independent actions. Then, the policy is optimized by multi-agent reinforcement learning through trial and error. This structured policy of local learning and global communication makes our framework applicable to various numbers of robots and objects with different and unknown weights, as demonstrated by numerical simulations.
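The idea of robots agreeing on a high-priority object using only neighbour-to-neighbour exchanges can be illustrated with a generic max-consensus loop. This is a textbook scheme, not the learned distributed policy from the abstract; the (priority, object_id) pairs, the communication graph, and the number of rounds are hypothetical.

```python
def max_consensus(local_best, neighbours, rounds):
    """Each robot keeps a (priority, object_id) pair and adopts the best pair among its neighbours.

    Tuples compare lexicographically, so ties in priority are broken by object_id.
    After a number of rounds equal to the graph diameter, all robots agree.
    """
    state = list(local_best)
    for _ in range(rounds):
        state = [max([state[i]] + [state[j] for j in neighbours[i]])
                 for i in range(len(state))]
    return state


if __name__ == "__main__":
    # Hypothetical: four robots on a line, each seeing a different "best" object locally.
    local = [(0.2, "A"), (0.9, "C"), (0.5, "B"), (0.1, "D")]
    line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(max_consensus(local, line, rounds=3))
```

The number of rounds needed grows with the diameter of the communication graph, which is one reason a scheme like this can scale to varying team sizes without retraining.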
In this paper, we present a solution to the problem of designing control strategies for multi-agent cooperative transport. Although existing learning-based methods assume that the number of agents is the same as that in the training environment, the number might differ in reality, because the robots' batteries may completely discharge or additional robots may be introduced to reduce the time required to complete a task. Therefore, it is crucial that the learned strategy be applicable to scenarios in which the number of agents differs from that in the training environment. We propose a novel multi-agent reinforcement learning framework of event-triggered communication and consensus-based control for distributed cooperative transport. The proposed policy model estimates the resultant force and torque in a consensus manner, using the estimates of the resultant force and torque exchanged with neighboring agents. Moreover, it computes the control and communication inputs, the latter determining when to communicate with the neighboring agents, from local observations and the estimates of the resultant force and torque. Therefore, the proposed framework can balance control performance and communication savings in scenarios in which the number of agents differs from that in the training environment. We confirm the effectiveness of our approach using up to eight robots in simulations and up to six robots in experiments.
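The combination of consensus-based estimation and event-triggered communication can be illustrated with a generic discrete-time average-consensus update in which each agent rebroadcasts its estimate only when it has drifted from its last broadcast value by more than a threshold. This is a standard hand-designed scheme, not the learned policy of the paper; the scalar force values, ring topology, step size `eps`, and threshold `delta` are hypothetical.

```python
def event_triggered_consensus(values, neighbours, eps=0.2, delta=0.01, steps=200):
    """Average consensus in which agents only see neighbours' last *broadcast* values."""
    x = list(values)          # local estimates (e.g. each agent's view of the resultant force)
    broadcast = list(values)  # value each agent last sent to its neighbours
    messages = 0
    for _ in range(steps):
        # Event trigger: rebroadcast only if the estimate drifted by more than delta.
        for i in range(len(x)):
            if abs(x[i] - broadcast[i]) > delta:
                broadcast[i] = x[i]
                messages += 1
        # Consensus step on broadcast values; symmetric links keep the sum of x unchanged.
        x = [x[i] + eps * sum(broadcast[j] - broadcast[i] for j in neighbours[i])
             for i in range(len(x))]
    return x, messages


if __name__ == "__main__":
    forces = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5]  # hypothetical per-agent estimates
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    estimates, sent = event_triggered_consensus(forces, ring)
    print([round(e, 3) for e in estimates], "messages:", sent)
```

Raising `delta` saves messages at the cost of a larger residual disagreement between agents, which is the control-versus-communication trade-off the abstract refers to.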
Humans exhibit a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demonstrations, often fails to capture such behavior. Specifically, commonly used learning algorithms embody inherent contradictions between the learning assumptions (e.g., a single optimal action) and actual human behavior (e.g., multiple optimal actions), thereby limiting robot generalizability, applicability, and demonstration feasibility. To address this, this paper proposes designing imitation learning algorithms that focus on utilizing human behavioral characteristics, thereby embodying principles for capturing and exploiting actual demonstrator behavioral characteristics. This paper presents the first imitation learning framework, Bayesian Disturbance Injection (BDI), that typifies human behavioral characteristics by incorporating model flexibility, robustification, and risk sensitivity. Bayesian inference is used to learn flexible non-parametric multi-action policies, while policies are simultaneously robustified by injecting risk-sensitive disturbances that induce human recovery actions and ensure demonstration feasibility. Our method is evaluated through risk-sensitive simulations and real-robot experiments (e.g., table-sweep, shaft-reach, and shaft-insertion tasks) using the UR5e 6-DOF robotic arm, to demonstrate the improved characterisation of behavior. Results show significant improvements in task performance through improved flexibility, robustness, and demonstration feasibility.
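Generative adversarial imitation learning (GAIL) can learn policies from demonstrations without explicitly defining a reward function. GAIL has the potential to learn policies with high-dimensional observations, such as images, as input. By applying GAIL to a real robot, it may be possible to obtain robot policies for daily activities such as washing, folding clothes, cooking, and cleaning. However, human demonstration data are often imperfect due to mistakes, which degrades the performance of the resulting policies. We address this issue by focusing on the following features: 1) many robotic tasks are goal-reaching tasks, and 2) labeling such goal states in demonstration data is relatively easy. With these in mind, this paper proposes Goal-Aware Generative Adversarial Imitation Learning (GA-GAIL), which trains a policy by introducing a second discriminator that distinguishes the goal state, in parallel with the first discriminator that indicates the demonstration data. This extends the standard GAIL framework to learn desirable policies more robustly, even from imperfect demonstrations, through a goal-state discriminator that promotes achieving the goal state. Furthermore, GA-GAIL employs the Entropy-maximizing Deep P-Network (EDPN) as the generator, which considers both smoothness and causal entropy in the policy update, to achieve stable policy learning from the two discriminators. Our proposed method was successfully applied to two real-robot cloth-manipulation tasks: turning a handkerchief over and folding clothes. We confirm that it learns cloth-manipulation policies without task-specific reward function design. Video of the real experiments is available at https://youtu.be/h_nii2ooure.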
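Massive online user activity data, such as weekly web search volumes, which co-evolve under the mutual influence of several queries and locations, serve as an important social sensor. Discovering the latent interactions in such data, i.e., the ecosystem among queries and the flow of influence between areas, makes it possible to accurately forecast future activity. However, this is a difficult problem in terms of data quantity and the complex patterns covering the dynamics. To tackle it, we propose FluxCube, an effective mining method that forecasts large collections of co-evolving online user activity and provides good interpretability. Our model is an extension of a combination of two mathematical models: a reaction-diffusion system, which provides a framework for modeling the flow of influence between local area groups, and an ecological system, which models the latent interactions between queries. Also, by leveraging the concept of physics-informed neural networks, FluxCube achieves both high interpretability, obtained from its parameters, and high forecasting performance. Extensive experiments on real datasets show that FluxCube outperforms comparable models in terms of forecasting accuracy, and that each component of FluxCube contributes to the enhanced performance. We then present case studies showing that FluxCube can extract useful latent interactions between queries and area groups.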
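The two ingredients named above can be combined in a toy discrete-time simulator: a Lotka-Volterra-style ecological term couples the queries within each area, and a graph-diffusion term moves activity between neighbouring areas. This is a generic sketch under stated assumptions (explicit Euler steps, hypothetical growth rates `r`, interaction matrix `A`, diffusion coefficient `D`, and area adjacency); it is not the FluxCube model itself and omits its physics-informed neural network training.

```python
def step(x, r, A, neighbours, D=0.1, dt=0.1):
    """One explicit Euler step for activity x[a][q] (area a, query q)."""
    n_areas, n_queries = len(x), len(x[0])
    new = []
    for a in range(n_areas):
        row = []
        for q in range(n_queries):
            # Reaction: Lotka-Volterra growth with interactions between queries in area a.
            interaction = sum(A[q][k] * x[a][k] for k in range(n_queries))
            reaction = x[a][q] * (r[q] + interaction)
            # Diffusion: flow of influence for query q from neighbouring areas.
            diffusion = D * sum(x[b][q] - x[a][q] for b in neighbours[a])
            # Clip at zero because search volumes cannot be negative.
            row.append(max(0.0, x[a][q] + dt * (reaction + diffusion)))
        new.append(row)
    return new


if __name__ == "__main__":
    # Hypothetical setup: 3 areas on a line, 2 competing queries.
    r = [0.8, 0.5]
    A = [[-1.0, -0.3], [-0.2, -1.0]]
    line = {0: [1], 1: [0, 2], 2: [1]}
    x = [[0.5, 0.1], [0.1, 0.1], [0.1, 0.5]]
    for _ in range(100):
        x = step(x, r, A, line)
    print([[round(v, 3) for v in row] for row in x])
```

With symmetric adjacency and no reaction term, the diffusion part conserves the total activity of each query, so growth or decline in the sketch comes only from the ecological term.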
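Deep reinforcement learning with domain randomization learns a control policy in various simulations with randomized physical and sensor model parameters, so that it can be transferred to the real world in a zero-shot setting. However, when the range of randomized parameters is wide, a huge number of samples is often required to learn an effective policy because of unstable policy updates. To alleviate this problem, we propose a sample-efficient method named Cyclic Policy Distillation (CPD). CPD divides the range of randomized parameters into several small sub-domains and assigns a local policy to each sub-domain. The local policies are then learned while cyclically transitioning the target sub-domain to neighboring sub-domains and exploiting the learned values/policies of the neighboring sub-domains with a monotonic policy-improvement scheme. Finally, all of the learned local policies are distilled into a global policy for sim-to-real transfer. The effectiveness and sample efficiency of CPD are demonstrated through simulations of four tasks (Pendulum from OpenAI Gym, and Pusher, Swimmer, and HalfCheetah from MuJoCo) and a real-robot ball-dispersal task.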
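The bookkeeping behind the cyclic schedule (splitting the randomization range into sub-domains and visiting them so that each new target is adjacent to one already trained) can be sketched as follows. The uniform one-dimensional split, the back-and-forth visiting order, and all function names are assumptions for illustration; the paper's monotonic policy-improvement step and the final distillation are not reproduced.

```python
import random


def split_range(low, high, n_sub):
    """Split [low, high) into n_sub equal-width sub-domains."""
    width = (high - low) / n_sub
    return [(low + i * width, low + (i + 1) * width) for i in range(n_sub)]


def cyclic_schedule(n_sub, n_visits):
    """Visit 0, 1, ..., n-1, n-2, ..., 1, 0, 1, ... so every move goes to an adjacent sub-domain."""
    if n_sub == 1:
        return [0] * n_visits
    period = list(range(n_sub)) + list(range(n_sub - 2, 0, -1))
    return [period[t % len(period)] for t in range(n_visits)]


def neighbours(i, n_sub):
    """Adjacent sub-domains whose learned values/policies the target could reuse."""
    return [j for j in (i - 1, i + 1) if 0 <= j < n_sub]


def sample_parameter(sub_domains, i, rng=random):
    """Draw a randomized physics parameter from sub-domain i."""
    lo, hi = sub_domains[i]
    return rng.uniform(lo, hi)


if __name__ == "__main__":
    subs = split_range(0.5, 1.5, 4)  # e.g. a hypothetical mass range
    for t, i in enumerate(cyclic_schedule(len(subs), 8)):
        # A full implementation would update local policy i here using its neighbours.
        print(t, "target", i, "neighbours", neighbours(i, len(subs)),
              "sample", round(sample_parameter(subs, i), 3))
```

Keeping each transition between adjacent sub-domains means the parameter distribution changes only slightly between consecutive learning phases, which is the intuition for why reusing a neighbour's values can stabilize updates.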
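Defining and separating cancer subtypes is essential for facilitating personalized therapy modalities and prognosis of patients. The definition of subtypes has been constantly recalibrated as our understanding deepens. During this recalibration, researchers often rely on clustering of cancer data to provide an intuitive visual reference that could reveal the intrinsic characteristics of subtypes. The data being clustered are often omics data, such as transcriptomics, that have strong correlations to the underlying biological mechanisms. However, while existing studies have shown promising results, they suffer from issues associated with omics data: sample scarcity and high dimensionality. As a result, existing methods often impose unrealistic assumptions to extract useful features from the data while avoiding overfitting to spurious correlations. In this paper, we propose to leverage a recent strong generative model, the Vector Quantized Variational AutoEncoder (VQ-VAE), to tackle these data issues and to extract informative latent features, which are crucial to the quality of subsequent clustering, by retaining only the information relevant to reconstructing the input. VQ-VAE does not impose strict assumptions, so its latent features are better representations of the input and can yield superior clustering performance with any mainstream clustering method. Extensive experiments and medical analysis on multiple datasets covering 10 distinct cancers demonstrate that the VQ-VAE clustering results can significantly and robustly improve prognosis over prevalent subtyping systems.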
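The vector-quantization step at the core of a VQ-VAE replaces each encoder output with its nearest codebook vector; the resulting discrete codes or quantized vectors are what a downstream clustering method would consume. Below is a minimal pure-Python sketch of that lookup plus the forward values of the codebook and commitment terms, using a hypothetical codebook and toy encoder outputs. The encoder, decoder, reconstruction loss, and gradient handling are omitted, and `beta = 0.25` is only a commonly used default, not a value from the abstract.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantise(z_e, codebook):
    """Map each encoder output to the index and value of its nearest codebook entry."""
    indices, z_q = [], []
    for z in z_e:
        k = min(range(len(codebook)), key=lambda j: squared_distance(z, codebook[j]))
        indices.append(k)
        z_q.append(codebook[k])
    return indices, z_q


def vq_losses(z_e, z_q, beta=0.25):
    """Forward values of the codebook and commitment terms.

    Numerically both are the mean ||z_e - z_q||^2; in an autodiff framework the
    stop-gradients differ (the codebook term moves z_q, the commitment term the encoder).
    """
    mse = sum(squared_distance(a, b) for a, b in zip(z_e, z_q)) / len(z_e)
    return mse, beta * mse


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # hypothetical learned codebook
    z_e = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0]]      # hypothetical encoder outputs
    idx, z_q = quantise(z_e, codebook)
    print(idx, vq_losses(z_e, z_q))
```

Because every sample is mapped onto a small set of learned codes, the quantized representation is much lower-dimensional than the raw omics profile, which is the property that makes it convenient input for standard clustering methods.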